The Story of WWW-Search-MSN and WWW-Search-AOL

Shlomi Fish on 2006-07-07T18:16:26

I was on Freenode's #perl the other day (before I had a permanent job) when someone asked if anybody was interested in writing a new CPAN module (to be released as open source) for a small fee. I replied that I was interested, and we ended up talking in private messages.

That module was WWW-Search-MSN, which is a WWW-Search backend for searching using the search.msn.com engine. It was released on CPAN, and the copyright notice says it was originally written as a commission from Deviate Media and Red Tree Media.

After that, they asked me if I'd also like to write WWW-Search-AOL for an additional, identical fee. I said I would, and wrote it. I recently received both payments to my PayPal account, so now I have a little money there. (I already used some of it to renew my Linux Weekly News subscription.) The fee was not too big, but enough to compensate for the time I spent on it. (Even though, unsurprisingly, SLOCCount still claims the modules would have cost more for a software house to develop.)

Now for some insights about the code itself. I parsed the HTML and processed it using HTML-Tree. The standard methods (look_up, look_down, etc.) were relatively convenient, and I could whip up a working version pretty quickly. The only problem is that it later broke on various edge cases, when the HTML returned by the search engines was somewhat different (such as when no results are returned). I was too lazy to check for all the undef returns from the HTML-Tree calls, so the code often breaks in such cases.
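A minimal sketch of the kind of undef checking described above. It is not the code from either module: the extract_titles helper and the markup it expects (a div with id "results" containing h3 headings with links) are assumptions made up for illustration, not the real search.msn.com or search.aol.com structure. Only the HTML::TreeBuilder and HTML::Element calls (new_from_content, look_down, as_text, delete) are the library's own.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the text of every result title on a page,
# or an empty list when the expected markup is missing.
sub extract_titles {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In scalar context look_down() returns the first match, or undef
    # when nothing matches (e.g. a "no results" page).
    my $results = $tree->look_down( _tag => 'div', id => 'results' );
    if ( !defined $results ) {
        $tree->delete;
        return;
    }

    my @titles;

    # In list context look_down() returns all matches (possibly none).
    for my $h3 ( $results->look_down( _tag => 'h3' ) ) {
        my $link = $h3->look_down( _tag => 'a' );
        next if !defined $link;    # skip headings without a link
        push @titles, $link->as_text;
    }

    $tree->delete;    # free the tree explicitly
    return @titles;
}
```

The point is that every scalar-context look_down call gets a defined check before a method is called on its result, so a page with unexpected markup yields an empty list instead of a "Can't call method on an undefined value" death.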

There are already two bugs for WWW-Search-MSN: one about the "Encode" dependency (I didn't realise Encode entered the core only after 5.6.1), and one about the module dying on invalid input. MSN has a SOAP service for accessing the results of their search, but SOAP is scary, and SOAP::Lite much more so. The guy who commissioned me to write the module tried to write something for that, and failed miserably. I do wish there were something simpler than SOAP for it, like well-formed YAML or JSON input, which would have been a piece of cake to implement in Perl. XML-RPC would also probably be better than SOAP for that.

As for the HTML - the search.msn.com HTML seemed very clean (at least after passing through HTML tidy) and was easy to analyse. search.aol.com seems like a Web 2.0-ish skin for Google, which was harder to analyse, but also not impossible. It might have a web service too.

While we're on the subject of getting some perks, someone I know online bought me "Perl Best Practices" by Damian Conway and "The Princess Bride" (the book) from my Amazon.com wishlist. I've already started reading "Perl Best Practices" before going to sleep. So far, the best practices contain useful and thought-provoking advice, even if I sometimes have my reasons for disagreeing with them.